- To explore relationships in our data
- To communicate/present what we found
Updated 6/26/2020
# install.packages("ggplot2")
# install.packages("tidyverse")
remotes::install_github("allisonhorst/palmerpenguins")
## Skipping install of 'palmerpenguins' from a github remote, the SHA1 (63696526) has not changed since last install.
##   Use `force = TRUE` to force installation
library(ggplot2)
library(palmerpenguins)
Note: I originally used the iris dataset, published by Ronald Fisher in the Annals of Eugenics in 1936. While the data themselves pertain to flowers, we cannot strip the data of their context. Given the joyous alternative of penguins and many other datasets (use data() to check out all the other datasets available that are not iris), switching away from iris is an easy choice.
data()
ggplot(data = penguins)
This creates a blank plot: ggplot() knows which data to use, but not which variables to plot or how to draw them. We need to decide what variables we want to plot.
head(penguins)
## # A tibble: 6 x 7
##   species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g sex  
##   <fct>   <fct>           <dbl>         <dbl>            <int>       <int> <fct>
## 1 Adelie  Torge…           39.1          18.7              181        3750 male 
## 2 Adelie  Torge…           39.5          17.4              186        3800 fema…
## 3 Adelie  Torge…           40.3          18                195        3250 fema…
## 4 Adelie  Torge…           NA            NA                 NA          NA <NA> 
## 5 Adelie  Torge…           36.7          19.3              193        3450 fema…
## 6 Adelie  Torge…           39.3          20.6              190        3650 male 
Let’s look at the relationship between the variables bill_length_mm and bill_depth_mm.
## Let's try bill_length_mm vs. bill_depth_mm (an x, y plot)
ggplot(data = penguins, aes(x = bill_length_mm, y = bill_depth_mm))
Still no points! The axes are in place, but we have not yet told ggplot what kind of plot to draw.
We need to think about how best to visualize this data.
summary(penguins)
##       species          island    bill_length_mm  bill_depth_mm  
##  Adelie   :152   Biscoe   :168   Min.   :32.10   Min.   :13.10  
##  Chinstrap: 68   Dream    :124   1st Qu.:39.23   1st Qu.:15.60  
##  Gentoo   :124   Torgersen: 52   Median :44.45   Median :17.30  
##                                  Mean   :43.92   Mean   :17.15  
##                                  3rd Qu.:48.50   3rd Qu.:18.70  
##                                  Max.   :59.60   Max.   :21.50  
##                                  NA's   :2       NA's   :2      
##  flipper_length_mm  body_mass_g       sex     
##  Min.   :172.0     Min.   :2700   female:165  
##  1st Qu.:190.0     1st Qu.:3550   male  :168  
##  Median :197.0     Median :4050   NA's  : 11  
##  Mean   :200.9     Mean   :4202               
##  3rd Qu.:213.0     3rd Qu.:4750               
##  Max.   :231.0     Max.   :6300               
##  NA's   :2         NA's   :2                  
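The summary() output reports NA's in several columns. Here is a small sketch that uses only base R (is.na() and colSums(), nothing specific to this tutorial) to count the missing values in every column at once. When ggplot2 plots a variable, it drops rows that are missing that variable, usually with a warning. Knowing which columns have gaps makes that warning less surprising:

```r
library(palmerpenguins)

# is.na() gives TRUE/FALSE for every cell; colSums() counts the TRUEs per column
na_counts <- colSums(is.na(penguins))
na_counts
```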
We need to think about what we want to show.
Quick exercise: try drawing with pen and paper what you think these plots might look like:
- bill_length_mm and bill_depth_mm
- bill_length_mm
- bill_length_mm by species

Hint: the options are a histogram, an XY scatter plot, and a boxplot.
Geom = geometry
The ‘geom_’ functions choose what kind of graph we want to show:
## add the geom that we want
ggplot(data = penguins, aes(x = bill_length_mm, y = bill_depth_mm)) +
  geom_point()
Is there anything you notice?
We can use color to differentiate the points by species.
ggplot(data = penguins, aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
  geom_point()
Now try it with shape and fill.
ggplot(data = penguins, aes(x = bill_length_mm, y = bill_depth_mm, shape = species)) +
  geom_point()
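You may need to troubleshoot a bit to find the best option for a given geom. For example, the default points drawn by geom_point() are solid shapes that use only color, so mapping fill = species has no visible effect on them.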
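Here is a minimal sketch of getting fill to show up on points. ggplot2 reuses the point shape codes from base R graphics, and shapes 21 to 25 have a separate outline (color) and interior (fill). Fill only becomes visible when you pick one of those shapes. The choice of shape 21, a black outline, and size = 2 is just illustrative:

```r
library(ggplot2)
library(palmerpenguins)

# shape = 21 is a circle with an outline (color) and an interior (fill)
p_fill <- ggplot(data = penguins, aes(x = bill_length_mm, y = bill_depth_mm, fill = species)) +
  geom_point(shape = 21, color = "black", size = 2)
p_fill
```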
## try it outside aes(): color is not mapped to species
ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm), color = species) +
  geom_point()
## set it in the geom: every point gets the same fixed color
ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm)) +
  geom_point(color = "red")
## map it inside aes(): each species gets its own color
ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
  geom_point()
Try playing around with the aesthetics and see what you get!
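Here is one variation to try, as a sketch: map species to both color and shape. Both aesthetics use the same variable and the same default title, so ggplot2 combines them into a single legend. The groups also stay distinguishable when the plot is printed in grayscale:

```r
library(ggplot2)
library(palmerpenguins)

# Redundant encoding: species drives both color and point shape
p_both <- ggplot(data = penguins,
                 aes(x = bill_length_mm, y = bill_depth_mm,
                     color = species, shape = species)) +
  geom_point()
p_both
```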
## The boxplot
ggplot(data = penguins, aes(x = species, y = bill_depth_mm)) +
  geom_boxplot()
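A boxplot summarizes each species but hides the individual penguins. One common variation, sketched here, overlays the raw observations with geom_jitter(). The box's own outlier points are hidden with outlier.shape = NA so they are not drawn twice. The jitter width of 0.2 and alpha of 0.5 are arbitrary choices:

```r
library(ggplot2)
library(palmerpenguins)

p_box <- ggplot(data = penguins, aes(x = species, y = bill_depth_mm)) +
  # hide the boxplot's outlier points; geom_jitter() draws every point anyway
  geom_boxplot(outlier.shape = NA) +
  # spread points sideways a little so they do not sit on one vertical line
  geom_jitter(width = 0.2, alpha = 0.5)
p_box
```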
## The histogram
ggplot(data = penguins, aes(x = bill_depth_mm)) +
  geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Play around with the binwidth option to see which bin width shows the distribution best.
## The histogram
ggplot(data = penguins, aes(x = bill_depth_mm, fill = species)) +
geom_histogram(binwidth = 0.25)
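Besides binwidth, geom_histogram() also takes a bins argument: the number of bins, which defaults to 30 as the stat_bin() message says. binwidth is measured in the units of the x variable, here millimetres. Use one argument or the other, not both. The values 20 and 0.5 below are just starting points for a comparison sketch:

```r
library(ggplot2)
library(palmerpenguins)

# bins: choose how many bins to split the range into
p_bins <- ggplot(data = penguins, aes(x = bill_depth_mm)) +
  geom_histogram(bins = 20)

# binwidth: choose how wide each bin is (0.5 mm here)
p_width <- ggplot(data = penguins, aes(x = bill_depth_mm)) +
  geom_histogram(binwidth = 0.5)

p_bins
p_width
```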
## The density plot
ggplot(data = penguins, aes(x = bill_depth_mm, fill = species)) +
  geom_density()
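Filled density curves drawn on top of each other can hide one another. One fix, assuming semi-transparency suits your plot, is to set alpha on the geom. An alpha of 0 is fully transparent and 1 is fully opaque. The value 0.5 here is an arbitrary middle ground:

```r
library(ggplot2)
library(palmerpenguins)

# alpha = 0.5 lets overlapping species densities show through each other
p_dens <- ggplot(data = penguins, aes(x = bill_depth_mm, fill = species)) +
  geom_density(alpha = 0.5)
p_dens
```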
## By fill
ggplot(data = penguins, aes(x = bill_depth_mm, fill = species)) +
  geom_histogram(binwidth = 0.25)
## By faceting
ggplot(data = penguins, aes(x = bill_depth_mm)) +
  geom_histogram(binwidth = 0.25) +
  facet_wrap(~ species)
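By default, facet_wrap() arranges its panels in a grid. In this sketch, setting ncol = 1 stacks the three species in a single column. That lines the panels up along the same x axis, which makes their bill depths easier to compare by eye:

```r
library(ggplot2)
library(palmerpenguins)

# One panel per species, stacked vertically (one column)
p_facet <- ggplot(data = penguins, aes(x = bill_depth_mm)) +
  geom_histogram(binwidth = 0.25) +
  facet_wrap(~ species, ncol = 1)
p_facet
```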
base <- ggplot(data = penguins, aes(x = species, y = bill_depth_mm, color = species)) + geom_boxplot()
base +
xlab("Species") +
ylab("Bill Depth")
base +
xlab("Species") +
ylab("Bill depth") +
scale_color_manual(values = c("purple", "blue", "lightblue"), name = "Species of Penguin")
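labs() is an alternative to separate xlab() and ylab() calls. It sets several labels at once, and naming an aesthetic such as color sets that legend's title. The sketch below rebuilds base as defined above so it runs on its own. The unit "(mm)" in the label comes from the variable name bill_depth_mm:

```r
library(ggplot2)
library(palmerpenguins)

# Same base plot as above
base <- ggplot(data = penguins, aes(x = species, y = bill_depth_mm, color = species)) +
  geom_boxplot()

# x/y set the axis titles; color sets the title of the color legend
p_labs <- base +
  labs(x = "Species", y = "Bill depth (mm)", color = "Species of Penguin")
p_labs
```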
base + theme_bw()
Try theme_dark() and theme_classic() too.
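Below is one way to compare the suggested themes. The extra theme() call is an addition here, not part of the original exercise. It tweaks individual elements on top of a complete theme, in this case moving the legend below the plot:

```r
library(ggplot2)
library(palmerpenguins)

base <- ggplot(data = penguins, aes(x = species, y = bill_depth_mm, color = species)) +
  geom_boxplot()

# Complete themes replace the whole look of the plot
base + theme_dark()
base + theme_classic()

# theme() adjusts single elements after a complete theme is applied
p_bottom <- base + theme_classic() + theme(legend.position = "bottom")
p_bottom
```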
base <- ggplot(data = penguins, aes(x = species, y = bill_depth_mm, color = species)) +
geom_boxplot() +
xlab("Species") +
ylab("Bill depth") +
scale_color_manual(values = c("purple", "blue", "lightblue"), name = "Species of Penguin") +
theme_bw()
ggsave("base_plot.pdf", base, device = "pdf")
## Saving 7.5 x 4.5 in image
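The message above shows that ggsave() chose the image size itself, because no size was given. The sketch below saves with an explicit size and resolution instead. ggsave() guesses the device from the file extension, so "base_plot.png" (an illustrative file name) produces a PNG. The 6 x 4 inch size and 300 dpi are example values:

```r
library(ggplot2)
library(palmerpenguins)

# Rebuild the same base plot as above so this snippet runs on its own
base <- ggplot(data = penguins, aes(x = species, y = bill_depth_mm, color = species)) +
  geom_boxplot() +
  xlab("Species") +
  ylab("Bill depth") +
  scale_color_manual(values = c("purple", "blue", "lightblue"), name = "Species of Penguin") +
  theme_bw()

# Explicit width/height (in inches) and resolution (dots per inch)
ggsave("base_plot.png", base, width = 6, height = 4, units = "in", dpi = 300)
```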